
    A Cyberpunk 2077 perspective on the prediction and understanding of future technology

    Full text link
    Science fiction and video games have long served as valuable tools for envisioning and inspiring future technological advancements. This position paper investigates the potential of Cyberpunk 2077, a popular science fiction video game, to shed light on the future of technology, particularly in the areas of artificial intelligence, edge computing, augmented humans, and biotechnology. By analyzing the game's portrayal of these technologies and their implications, we aim to understand the possibilities and challenges that lie ahead. We discuss key themes such as neurolink and brain-computer interfaces, multimodal recording systems, virtual and simulated reality, digital representation of the physical world, augmented and AI-based home appliances, smart clothing, and autonomous vehicles. The paper highlights the importance of designing technologies that can coexist with existing preferences and systems, considering the uneven adoption of new technologies. Through this exploration, we emphasize the potential of science fiction and video games like Cyberpunk 2077 as tools for guiding future technological advancements and shaping public perception of emerging innovations. Comment: 12 pages, 7 figures

    Evaluation of real-time LBP computing in multiple architectures

    Get PDF
    Abstract Local Binary Pattern (LBP) is a texture operator that is used in several different computer vision applications requiring, in many cases, real-time operation on multiple computing platforms. The arrival of new video standards has increased typical resolutions and frame rates, which demand considerable computational performance. Since LBP is essentially a pixel operator that scales with image size, typical straightforward implementations are usually insufficient to meet these requirements. To identify the solutions that maximize the performance of real-time LBP extraction, we compare a series of different implementations in terms of computational performance and energy efficiency, while analyzing the optimizations that can be made to reach real-time performance on multiple platforms and their different available computing resources. Our contribution provides an extensive survey of the LBP implementations on different platforms that can be found in the literature. To provide a more complete evaluation, we have implemented the LBP algorithms on several platforms, such as Graphics Processing Units, mobile processors and a hybrid programming model image coprocessor. We have extended the evaluation of some of the solutions that can be found in previous work. In addition, we publish the source code of our implementations.
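
    The paper targets optimized GPU, mobile-processor and image-coprocessor implementations; purely as a point of reference, a minimal NumPy sketch of the basic 8-neighbor LBP operator (not the paper's optimized code) might look as follows.

```python
import numpy as np

def lbp_8neighbors(image):
    """Basic 3x3 Local Binary Pattern for a grayscale image.

    Each pixel is compared against its 8 neighbors; a neighbor greater than
    or equal to the center contributes a 1-bit to an 8-bit code.
    Border pixels are left as 0 for simplicity.
    """
    img = image.astype(np.float32)
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint8)
    # Neighbor offsets in clockwise order starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h-1, 1:w-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        codes[1:h-1, 1:w-1] |= ((neighbor >= center).astype(np.uint8) << bit)
    return codes

# Example: LBP codes of a random 8-bit image.
lbp_map = lbp_8neighbors(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
```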

    Kinship Verification from Videos using Spatio-Temporal Texture Features and Deep Learning

    Full text link
    Automatic kinship verification using facial images is a relatively new and challenging research problem in computer vision. It consists of automatically predicting whether two persons have a biological kin relation by examining their facial attributes. While most of the existing works extract shallow handcrafted features from still face images, we approach this problem from a spatio-temporal point of view and explore the use of both shallow texture features and deep features for characterizing faces. Promising results, especially those of deep features, are obtained on the benchmark UvA-NEMO Smile database. Our extensive experiments also show the superiority of using videos over still images, hence pointing out the important role of facial dynamics in kinship verification. Furthermore, the fusion of the two types of features (i.e. shallow spatio-temporal texture features and deep features) shows significant performance improvements compared to state-of-the-art methods. Comment: 7 pages

    Improving Depression estimation from facial videos with face alignment, training optimization and scheduling

    Full text link
    Deep learning models have shown promising results in recognizing depressive states from video-based facial expressions. While successful models typically leverage 3D-CNNs or video distillation techniques, the varying use of pretraining, data augmentation, preprocessing, and optimization techniques across experiments makes it difficult to make fair architectural comparisons. We propose instead to enhance two simple ResNet-50-based models that use only static spatial information, by applying two specific face alignment methods and improved data augmentation, optimization, and scheduling techniques. Our extensive experiments on benchmark datasets obtain results similar to those of sophisticated spatio-temporal models for single streams, while the score-level fusion of two different streams outperforms state-of-the-art methods. Our findings suggest that specific modifications in the preprocessing and training process produce noticeable differences in model performance and could hide the actual gains originally attributed to the use of different neural network architectures. Comment: 5 pages
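
    The abstract mentions score-level fusion of two streams; a minimal sketch of what such a fusion step could look like, assuming each stream already produces a per-video depression score (the function name, weight and example numbers below are purely illustrative), is:

```python
import numpy as np

def fuse_scores(scores_stream_a, scores_stream_b, weight_a=0.5):
    """Score-level fusion of two prediction streams by weighted averaging.

    scores_stream_a / scores_stream_b: per-video depression score estimates
    (e.g., predicted questionnaire scores) from two independently trained models.
    weight_a: illustrative fusion weight; 0.5 gives a plain average.
    """
    a = np.asarray(scores_stream_a, dtype=np.float64)
    b = np.asarray(scores_stream_b, dtype=np.float64)
    return weight_a * a + (1.0 - weight_a) * b

# Hypothetical example: predictions from two streams on five videos.
stream_a = [12.4, 30.1, 5.7, 18.9, 22.0]
stream_b = [10.8, 28.5, 7.2, 20.3, 19.6]
fused = fuse_scores(stream_a, stream_b)  # simple average of both streams
```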

    Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support

    Full text link
    In global healthcare, respiratory diseases are a leading cause of mortality, underscoring the need for rapid and accurate diagnostics. To advance rapid screening techniques via auscultation, our research focuses on employing one of the largest publicly available medical database of respiratory sounds to train multiple machine learning models able to classify different health conditions. Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to extract physiologically relevant biosignals from acoustic data, closely tied to cardiovascular and respiratory patterns, making our approach apart in its departure from conventional audio feature extraction practices. We use Power Spectral Density analysis and filtering techniques to select Intrinsic Mode Functions (IMFs) strongly correlated with underlying physiological phenomena. These biosignals undergo a comprehensive feature extraction process for predictive modeling. Initially, we deploy a binary classification model that demonstrates a balanced accuracy of 87% in distinguishing between healthy and diseased individuals. Subsequently, we employ a six-class classification model that achieves a balanced accuracy of 72% in diagnosing specific respiratory conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For the first time, we also introduce regression models that estimate age and body mass index (BMI) based solely on acoustic data, as well as a model for gender classification. Our findings underscore the potential of this approach to significantly enhance assistive and remote diagnostic capabilities.Comment: 5 pages, 2 figures, 3 tables, Conference pape
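
    As an illustration of the general EMD-plus-PSD idea described above (not the authors' exact parameters), the following sketch decomposes a recording with PyEMD and keeps the IMFs whose dominant Welch-PSD frequency falls inside an assumed low-frequency physiological band:

```python
import numpy as np
from scipy.signal import welch
from PyEMD import EMD  # pip install EMD-signal

def select_physiological_imfs(audio, fs, band=(0.1, 2.5)):
    """Decompose an audio signal with EMD and keep the IMFs whose dominant
    spectral content lies in a low-frequency band plausibly tied to
    respiratory/cardiovascular rhythms (the band limits are illustrative).
    """
    imfs = EMD()(audio)                      # shape: (n_imfs, len(audio))
    selected = []
    for imf in imfs:
        freqs, psd = welch(imf, fs=fs, nperseg=min(4096, len(imf)))
        dominant = freqs[np.argmax(psd)]     # frequency carrying the most power
        if band[0] <= dominant <= band[1]:
            selected.append(imf)
    return np.array(selected)

# Hypothetical usage with a 10 s recording sampled at 4 kHz.
fs = 4000
audio = np.random.randn(10 * fs)             # placeholder for a lung-sound recording
biosignals = select_physiological_imfs(audio, fs)
```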

    Natural course of septo-optic dysplasia: Retrospective analysis of 20 cases

    Full text link
    Introduction. Septo-optic dysplasia (SOD) is the variable combination of signs of dysgenesis of the midline of the brain, hypoplasia of the optic nerves and hypothalamic-pituitary dysfunction, sometimes associated with a varied spectrum of malformations of the cerebral cortex. Aims. To describe the natural history and neuroimaging findings in a series of 20 diagnosed patients. Patients and methods. We retrospectively reviewed the epidemiological, clinical and neuroradiological characteristics of 20 consecutive patients diagnosed with SOD between January 1985 and January 2010. Data from computerised tomography, cranial magnetic resonance imaging, electroencephalography, visual evoked potentials, ophthalmological evaluation, karyotyping and endocrinological studies were analysed. In seven patients, the homeobox gene HESX1 was studied. Results. Sixty per cent of the cases had pathological antecedents in the first trimester of gestation, with normal foetal ultrasound scans. Clinically, the most striking features were visual manifestations (85%), endocrine disorders (50%), mental retardation (60%) and epileptic seizures (55%). Fifty-five per cent were associated with abnormal neuronal migration. In 45%, SOD was the only neuroimaging finding. Karyotyping was performed in all cases, with normal results. The HESX1 study was positive in two of the seven cases analysed (both with isolated SOD). None of the patients with a HESX1 mutation had familial consanguinity. No genetic study was performed on the parents. Conclusions. SOD should be classified as a heterogeneous malformation syndrome that associates multiple brain, ocular, endocrine and systemic anomalies. The most severe forms are associated with abnormal neuronal migration and cortical organisation.

    Introducing VTT-ConIot: A Realistic Dataset for Activity Recognition of Construction Workers Using IMU Devices

    Get PDF
    Sustainable work aims at improving working conditions to allow workers to effectively extend their working life. In this context, occupational safety and well-being are major concerns, especially in labor-intensive fields such as construction-related work. The Internet of Things and wearable sensors provide unobtrusive technology that could enhance safety using human activity recognition (HAR) techniques, and have the potential to improve work conditions and health. However, the research community lacks commonly used standard datasets that provide realistic and varied activities from multiple users. In this article, our contributions are threefold. First, we present VTT-ConIoT, a new publicly available dataset for the evaluation of HAR from inertial sensors in professional construction settings. The dataset, which contains data from 13 users and 16 different activities, is collected from three different wearable sensor locations. Second, we provide a benchmark baseline for human activity recognition that achieves a classification accuracy of up to 89% for a six-class setup and up to 78% for a more granular sixteen-class one. Finally, we show an analysis of the representativity and usefulness of the dataset by comparing it with data collected in a pilot study conducted in a real construction environment with real workers.
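
    For readers who want a starting point, a minimal windowed-IMU classification baseline along these lines could be sketched as follows; the window length, feature set, channel count and classifier are assumptions for illustration, not the benchmark configuration from the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(signal, labels, fs=50, win_s=2.0):
    """Split a (n_samples, n_channels) IMU stream into fixed-length windows
    and compute simple per-channel statistics (mean, std, min, max).
    fs and win_s are illustrative values, not the dataset's actual settings.
    """
    win = int(fs * win_s)
    feats, ys = [], []
    for start in range(0, len(signal) - win + 1, win):
        seg = signal[start:start + win]
        feats.append(np.concatenate([seg.mean(0), seg.std(0), seg.min(0), seg.max(0)]))
        ys.append(np.bincount(labels[start:start + win]).argmax())  # majority label
    return np.array(feats), np.array(ys)

# Hypothetical data: 1 minute of 9-channel IMU data with 6 activity classes.
imu = np.random.randn(3000, 9)
lab = np.random.randint(0, 6, 3000)
X, y = window_features(imu, lab)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=3).mean())
```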

    Designing for energy-efficient vision-based interactivity on mobile devices

    No full text
    Abstract Future multimodal mobile platforms are expected to require high interactivity in their applications and user interfaces. Until now, mobile devices have been designed to remain, in the interaction sense, in a stand-by state until the user actively wakes them, an approach motivated by battery conservation. Imaging is a versatile sensing modality that can enable context recognition, unobtrusively predicting the user's interaction needs and directing the computational resources accordingly. However, vision-based always-on functionalities have been impractical in battery-powered devices, since their computational power and energy requirements make their use unattainable for extended periods of time. Vision-based applications can benefit from the addition of interactive stages that, when properly designed, reduce the complexity of the methods by utilizing user feedback and collaboration, resulting in a system that balances computational throughput and energy efficiency. The usability of user interfaces critically rests on their latency, yet an always-on sensing platform needs to be carefully balanced against power consumption demands. When designing highly interactive vision-based interfaces, reactiveness can be improved by reducing the number of operations that the application processor needs to execute, offloading the most expensive tasks to accelerators or dedicated processors. In this context, this thesis investigates and surveys enablers and solutions for vision-based interactivity on mobile devices. The thesis explores the development of new user interaction methods by analyzing and comparing means to reach interactivity, high performance, low latency and energy efficiency. The researched techniques, ranging from mobile GPGPU and dedicated sensor processing to reconfigurable image processors, provide understanding for designing future mobile platforms.

    Face2PPG: An unsupervised pipeline for blood volume pulse extraction from faces

    Full text link
    Photoplethysmography (PPG) signals have become a key technology in many fields, such as medicine, well-being, or sports. Our work proposes a set of pipelines to extract remote PPG signals (rPPG) from the face robustly, reliably, and in a configurable manner. We identify and evaluate the possible choices in the critical steps of unsupervised rPPG methodologies. We evaluate a state-of-the-art processing pipeline on six different datasets, incorporating important corrections in the methodology that ensure reproducible and fair comparisons. In addition, we extend the pipeline by proposing three novel ideas: 1) a new method to stabilize the detected face based on a rigid mesh normalization; 2) a new method to dynamically select the regions of the face that provide the best raw signals; and 3) a new RGB-to-rPPG transformation method, called Orthogonal Matrix Image Transformation (OMIT), based on QR decomposition, which increases robustness against compression artifacts. We show that all three changes introduce noticeable improvements in retrieving rPPG signals from faces, obtaining state-of-the-art results compared with unsupervised, non-learning-based methodologies and, in some databases, results very close to those of supervised, learning-based methods. We perform a comparative study to quantify the contribution of each proposed idea. In addition, we present a series of observations that could help in future implementations. Comment: 20 pages, 10 figures
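
    The abstract only states that OMIT builds on QR decomposition; the sketch below shows one plausible way a QR-based RGB-to-rPPG orthogonalization could be wired up, and should not be read as the paper's exact OMIT formulation.

```python
import numpy as np

def qr_orthogonalized_pulse(rgb_traces):
    """Illustrative QR-based orthogonalization of RGB traces, shape (3, T).

    Sketch of the general idea only: channels are z-score normalized, the
    T x 3 matrix is factored as Q @ R, and the second orthonormal column
    (the green trace with the red-dominated direction projected out) is
    taken as the pulse candidate. The exact OMIT definition is in the paper.
    """
    X = np.asarray(rgb_traces, dtype=np.float64)
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)
    Q, _ = np.linalg.qr(X.T, mode='reduced')   # Q: (T, 3) with orthonormal columns
    pulse = Q[:, 1]
    return (pulse - pulse.mean()) / (pulse.std() + 1e-8)

# Hypothetical usage: mean RGB values from a facial region over 300 frames.
rgb = np.random.rand(3, 300)
bvp_candidate = qr_orthogonalized_pulse(rgb)
```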